Summary. In linear inversion of a finite-dimensional data vector y to estimate a finite-dimensional
prediction vector z, prior information about x_E is essential if y is to supply useful limits for z.
The one exception occurs when all the prediction functionals are linear combinations of the data
functionals.

We compare two forms of prior information: a "soft" bound on x_E is a probability distribution p_X
on X which describes the observer's opinion about where x_E is likely to be in X; a "hard" bound on
x_E is an inequality Q_X(x_E, x_E) \le 1, where Q_X is a positive definite quadratic form on X. A
hard bound Q_X can be "softened" to many different probability distributions p_X, but all these
p_X's carry much new information about x_E which is absent from Q_X, and some information which
contradicts Q_X. For example, all the p_X's give very accurate estimates of several other functions
of x_E besides Q_X(x_E, x_E). And all the p_X's which preserve the rotational symmetry of Q_X assign
probability 1 to the event Q_X(x_E, x_E) = \infty. Both stochastic inversion (SI) and Bayesian
inference (BI) estimate z from y and a soft prior bound p_X. If that probability distribution was
obtained by softening a hard prior bound Q_X, rather than by objective statistical inference
independent of y, then p_X contains so much unsupported new "information" absent from Q_X that
conclusions about z obtained with SI or BI would seem to be suspect.

1. Introduction

Most geophysical inverse problems require prior information for their solution (Backus, 1970a;
Franklin, 1970), information known to the observer before he obtained the data to be inverted.
That prior information is often cast in the form of a probability distribution p_X on the linear
space X of possible earth models x, but it can also take the form of one or more bounds on the
correct earth model x_E. These bounds are usually linear or quadratic. Linear bounds take the form
a \le f(x_E) \le A, where a and A are known real numbers and f : X \to R is a known real-valued
linear function on X (R is the real line). Positivity constraints on the density are an example of
linear bounds. Quadratic bounds take the form

    Q_X(x_E, x_E) \le 1                                                        (1.1)

where Q_X is a known positive-definite quadratic form on X. That is, Q_X(x_1, x_2) is a real number
which depends linearly on each of x_1 and x_2 when the other is fixed; and
Q_X(x_1, x_2) = Q_X(x_2, x_1); and Q_X(x, x) > 0 unless x = 0. Energy constraints are examples of
(1.1). Jackson (1979) calls the probability distributions "soft" bounds on x_E, and the
inequalities "hard" bounds.

There are two kinds of soft bounds, subjective and objective. A subjective soft bound is a
probability distribution p_X on X which represents an observer's subjective personal opinion about
where x_E is likely to be in X. This p_X might be obtained by "softening" a hard quadratic bound
(1.1) when the observer is unwilling to adopt (1.1) with certainty. Then he could replace (1.1) by
a gaussian with mean 0 and variance tensor Q_X^{-1}. Hard linear bounds can also be softened
(Jackson, 1979). The observer's ability to persuade his colleagues to accept his use of a
subjective soft bound to invert the data will depend on his ability to persuade them to share his
prior personal probability distribution.

An objective soft bound is a probability distribution p_X on X which models a realizable
population of possible models x. Such a p_X might be estimated by repeatedly drawing random
samples x from X, as in the analysis of a stationary time series, or the aiming strategy of an
antiaircraft weapon in a protracted war. Alternatively, an objective soft bound might come from
a theory of the source of the models: a complete theory of the geodynamo might provide a
probability distribution for the gauss coefficients of the geomagnetic field at the core-mantle
boundary. Finally, an objective soft bound might appear as a hypothesis to be tested: perhaps the
paleomagnetic data can be fitted to a statistical model which treats the gauss coefficients as
uncorrelated gaussian random variables (C. Constable and R. Parker, private communication; they see
evidence for some correlations).

Hard linear bounds can be incorporated directly into a geophysical inversion by means of
linear programming (Dantzig, 1963; Heustis & Parker, 1977). Hard quadratic bounds can be
incorporated by the method which we will call "hard quadratic inversion," HQI (Backus, 1970a).
Subjective soft bounds are best treated by Bayesian inference, BI, which is directly concerned
with how an observer alters his personal probability distribution for x_E when he learns of new
data with known error statistics. Backus (1987) gives references to some of the early work on BI,
which goes back to Bayes (1764). Objective soft bounds are best treated by stochastic inversion,
SI, which applies a minimum-variance linear estimator to the data vector y (Franklin, 1970;
Jackson, 1979). The idea is to find the linear mapping H : Y \to X from the data space Y to the
model space X which is statistically best for estimating a model x from its observed data vector y
as H(y). More precisely, H is chosen to minimize the expected value of the squared distance from
x to H(y) in a long series of trials, model vectors x being drawn at random from X according to
p_X, and their data vectors y being observed and used to estimate x. Both p_X and the error
statistics of y contribute to H.
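
The minimum-variance criterion can be made concrete in one dimension. The following Python sketch
is illustrative only and is not taken from Franklin (1970) or Jackson (1979). It assumes a scalar
model x with prior mean 0 and variance s2, observed as y = f*x + e, where the error e is
independent of x with mean 0 and variance sigma2. The names f, s2, sigma2 and both function names
are hypothetical.

```python
def expected_sq_error(h, f, s2, sigma2):
    """E[(x - h*y)^2] for y = f*x + e with Var(x) = s2, Var(e) = sigma2, zero means."""
    return (1.0 - h * f) ** 2 * s2 + h ** 2 * sigma2


def best_linear_estimator(f, s2, sigma2):
    """The h that minimizes expected_sq_error (derivative with respect to h set to zero)."""
    return f * s2 / (f ** 2 * s2 + sigma2)


# Illustrative numbers only.
f, s2, sigma2 = 2.0, 1.0, 0.5
h = best_linear_estimator(f, s2, sigma2)
print(h, expected_sq_error(h, f, s2, sigma2))
```

Both the prior variance s2 and the error variance sigma2 enter h, which is the scalar counterpart
of the remark that both p_X and the error statistics of y contribute to H.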

Some observers (Backus, 1987) take the view that stochastic inversion is inappropriate
when there is only one correct earth model x_E, and p_X is a prior personal probability
distribution, a subjective soft bound. Bayesian inference seems to be the proper procedure here.
Others disagree (Jackson, 1979). Fortunately, Bayesian inference and stochastic inversion lead to
the same result when p_X and the statistics of the errors in the data are gaussian (Backus, 1987),
so in that case there is no need to choose between SI and BI.

Bayesian inference (and, for some observers, stochastic inversion) can be used with hard 
prior bounds if those bounds are first softened to subjective soft prior bounds (Backus, 1970b; 
Jackson, 1979; Gubbins, 1983). 

It is the thesis of the present paper that the relationship between hard and soft bounds is not
as simple as their names would lead one to expect, and that neither Bayesian inference nor
stochastic inversion is appropriate for incorporating a hard quadratic prior bound into a data
inversion. If a single inequality (1.1) is really all the prior information the observer wants to
use, he must confine himself to hard quadratic inversion. Use of BI and SI introduces new
information not contained in (1.1), and this information makes very precise quantitative claims
about x_E which come entirely from the bound-softening process and are independent of the observed
data. We illustrate the problem with the example of continuing the geomagnetic field B down to the
core-mantle boundary (CMB). Here, softening the hard heat flow bound (Gubbins, 1983; Backus,
1987) or the hard energy bound (Backus, 1987) makes a priori claims about the gauss
coefficients of B at the CMB which many workers in geomagnetism would find preposterous.

We have not investigated the softening of hard linear bounds, and make no comments on
that subject. In a later paper we will extend the discussion of hard quadratic inversion begun by
Backus (1970a). Preliminary calculations indicate that in many cases estimates of the correct
model x_E will not be very different in HQI from those obtained by correctly executed BI or SI,
and that the error estimates in HQI may be larger than those of BI or SI by a factor of order
two. Most of the published Bayesian and stochastic inversions are simply regularizations, the
"prior" information being inferred from the data to be inverted. Therefore, those inversions
produce physically acceptable models which fit the data within the expected data errors, but such
inversions cannot support the error estimates for x_E reported by their users (Backus, 1987).
Defensible error estimates on x_E are not yet available in these inversions. If the prior
information is only a hard quadratic bound (1.1) then those error estimates must come from hard
quadratic inversion, not stochastic inversion or Bayesian inference.

If the observer really does have prior information about x_E which can be described by a
probability distribution p_X on the model space X, then he has much valuable information about
x_E not contained in (1.1), and of course he is entitled to use this information in any inversion
of the data. If his p_X is objective, then the evidence for it will be objective, and available
before the data are obtained. Then he should have no difficulty convincing colleagues to accept
the conclusions he draws from the data. However, if his p_X is a subjective prior personal
probability distribution, he may have trouble defending it, or accepting it himself, when he
realizes how much more he is assuming a priori about x_E than is contained in (1.1).

The plan of this paper is to formulate the linear inverse problem with great generality, so as
to make clear why prior information is almost always essential, and so as to exhibit plainly just
what prior information goes into the inversion. The substantive content of the paper is the
comparison of the prior information about x_E contained in hard quadratic bounds (1.1) and
probability distributions p_X on the model space X (soft bounds).

2. The need for prior information 

Surely data are preferable to opinions. If we have enough good data, why can we not dispense
with prior opinions? The reason is that our model spaces are usually infinite-dimensional, and we
never have more than finitely many data (Backus and Gilbert, 1967). For example, suppose we
want to use surface and satellite measurements of the geomagnetic field B to estimate the radial
component B_r at the core-mantle boundary (CMB). The apparently infinite data set from the
satellite track can be Fourier analyzed, and only finitely many Fourier components will be above
the noise. Hence there will be only finitely many data. The model space X can be parametrized
by the Schmidt semi-normalized gauss coefficients g_l^m at the CMB (l is degree, and m is
longitudinal order, so -l \le m \le l). Clearly dim X = \infty.

It is sometimes supposed that by studying the resolution of the data we can remove the
difficulty of having only finitely many equations for infinitely many unknowns. In the example
of the geomagnetic field, Gubbins and Bloxham (1985) find that the surface and satellite data do
not resolve gauss coefficients at the CMB above degree 20. Backus (1987) shows that no surface
and satellite data with a 1 nT error of measurement can resolve gauss coefficients at the CMB
with degree l \ge 32 unless the ohmic heating rate in the core exceeds the total geothermal flow at
the earth's surface. To fit geomagnetic data whose error of measurement is at least 1 nT, we need
never use a model space X at the CMB whose dimension exceeds l(l+2) with l = 31. That is,
dim X \le 1023. The number of satellite data from MAGSAT exceeds this value by at least a factor
of 10 (Langel, Estes and Mead, 1982).
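
The count l(l+2) is simply the number of gauss coefficients g_l^m with degree between 1 and l,
since each degree contributes 2l+1 orders m. A short Python check of this arithmetic follows; the
function name is our own and purely illustrative.

```python
def num_gauss_coefficients(max_degree):
    """Number of coefficients g_l^m with 1 <= l <= max_degree and -l <= m <= l."""
    return sum(2 * l + 1 for l in range(1, max_degree + 1))


# Truncation at degree 31, as in the text; the sum equals 31 * (31 + 2).
print(num_gauss_coefficients(31))
```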

In fact, however, arguments based on resolution cannot cut down the model space X to
finite dimensionality, because usually the predictions we want to make about the correct model
x_E involve components of x_E not resolved by the data.

The present section formulates the inverse problem in a way which makes clear when prior
information is needed to invert the data. We consider only data and predictions which depend
linearly on the model. An approximate discussion of the nonlinear problem appears, for example,
in Backus and Gilbert (1968). For the sake of generality, we do not introduce a topology on X,
the infinite-dimensional real linear space of models x. In particular, we do not assume that X is a
Hilbert space.

Our conventions of notation are as follows: R is the real line. If Y and Z are sets, y \in Y
means that y is a member of Y, and Y \subset Z means that Y is a subset of Z. If a function f
assigns to each y in Y a unique value z = f(y) in Z, then we write f : Y \to Z. This symbol can
also be read as a substantive, "the function f which maps Y into Z."

A vector or model is simply a member of the linear space X. A dual vector or dual model is
a linear function f : X \to R, i.e., a linear functional on X. Linearity of f means, of course,
that if b^1, ..., b^n \in R and x_1, ..., x_n \in X then

    f(b^j x_j) = b^j f(x_j).                                                   (2.1)

Here we use the Einstein summation convention: if an index appears once as a subscript and once
as a superscript in a single term or a product, the summation over all its possible values is
understood. The set of all dual vectors is written X^*. If a_1, ..., a_m \in R and
f^1, ..., f^m \in X^*, then the function a_i f^i : X \to R is defined by requiring that for each
vector x in X

    (a_i f^i)(x) = a_i [f^i(x)].                                               (2.2)

From (2.1) it is easy to verify that a_i f^i \in X^*, so X^* is a real linear space.

By R^n we will mean the real linear space of 1 \times n matrices y = (y^1, ..., y^n). In addition
to the model space X two other real linear spaces enter the inverse problem: Y = R^d and Z = R^p.
Here Y is the data space and Z is the prediction space. Finally, we know two linear functions,
F : X \to Y and G : X \to Z. Our inverse problem is summarized as follows: if x_E is the model in
X which best represents the real earth, then

    y = F(x_E) + \delta_R y + \delta_X y                                       (2.3a)

    z = G(x_E) + \delta_X z.                                                   (2.3b)

Here we have collected d data about the earth, real numbers y^1, ..., y^d which make up the data
vector y = (y^1, ..., y^d) in Y. We would like to use y to predict p other data about the earth,
real numbers z^1, ..., z^p which make up the prediction vector z = (z^1, ..., z^p) in Z. Some of
the z^i may never be directly observable, but we would like to estimate their values. If there
were no errors, y would be F(x_E) and z would be G(x_E). There is a random error \delta_R y^i in
each of the observed data y^i, and \delta_R y = (\delta_R y^1, ..., \delta_R y^d) is the random
error vector. The failure of the models in X to include all relevant features of the earth
produces systematic errors \delta_X y in y and \delta_X z in z. It is crucial in the arguments to
follow that dim Y and dim Z are finite.

Of course we do not know the errors \delta_R y, \delta_X y and \delta_X z, but we must know
something about them or the inverse problem is hopeless. We assume that we know a hard quadratic
bound like (1.1) on each of the systematic error vectors, \delta_X y and \delta_X z. We also assume
that we know the probability distribution p_R of the random error vector \delta_R y in the data
space Y. Therefore given any f : Y \to R, we can calculate the expected value of f(y),

    \langle f(y) \rangle = \int_Y dp_R(y) f(y).                                (2.4)

In particular, we can calculate
\langle \delta_R y \rangle = (\langle \delta_R y^1 \rangle, ..., \langle \delta_R y^d \rangle),
and redefine y as y - \langle \delta_R y \rangle, so we can assume that

    \langle \delta_R y \rangle = 0.                                            (2.5)

In the geomagnetic example of downward continuation of surface and satellite data to the
core-mantle boundary (CMB) we will take for the model space X the space of all magnetic fields
B defined above the CMB, irrotational and solenoidal there, and vanishing at infinity. Obviously
dim X = \infty here. The data y^1, ..., y^d are Cartesian components of B at finitely many sites
on and above the surface of the earth. The quantities z^1, ..., z^p to be estimated might be p
gauss coefficients of B at the CMB, or the values of the radial component B_r at p sites on the
CMB, or the magnetic flux through p null-flux curves on the CMB (Backus, 1968; Gubbins and
Bloxham, 1985). In the null-flux example, G : X \to Z in (2.3b) is nonlinear and must be
linearized (Gubbins and Bloxham, loc. cit.), and our linear calculations will be only
approximately correct.

In this geomagnetic example, the total error vector \delta_R y + \delta_X y includes instrument
errors, site location errors, stray fields, and contributions to B from crustal magnetization or
the electric currents in the mantle, ionosphere and magnetosphere. The errors about which we have
statistical information are lumped in \delta_R y, and the remaining errors constitute \delta_X y.
It is our thesis that in spaces of high dimension, statistical information is stronger than a
quadratic bound, so we claim to know more about \delta_R y than about \delta_X y. The formulation
of the inverse problem can be "upgraded" by promoting part of \delta_X y to \delta_R y or part of
either \delta_X y or \delta_R y to X. For example, if nothing is known about the magnetic field of
the crust at satellite altitudes except a bound on its intensity, it belongs in \delta_X y. If we
are willing to treat the crustal field as a two-dimensional random process on the surface of a
sphere, and if we know its statistics, then it belongs in \delta_R y. Finally, we can include the
crustal field in X by expanding the model space so that each model includes a jump in B_r at the
surface of the earth (Backus, 1986). If we want to include the crustal field in X, so as to model
surface as well as satellite observations of B, Langel et al. (1982) point out that we can make
station corrections obtained from other satellite data at other times.

The analysis of the linear inverse problem hinges on the following observation: if f^1, ..., f^n
are linearly independent dual vectors (linear functionals) in X^* then there are vectors
x_1, ..., x_n in X such that for 1 \le i, j \le n

    f^i(x_j) = \delta^i_j                                                      (2.6)

where \delta^i_j is the Kronecker delta, 1 when i = j and 0 otherwise. We prove this fact by
induction on n in imitation of Gram-Schmidt orthogonalization. If n = 1, linear independence means
simply f^1 \ne 0. Then there is an x_0 in X such that f^1(x_0) = a \ne 0. We take
x_1 = a^{-1} x_0. Now suppose we know (2.6) for n, and we want to prove it for n+1. We are given
linearly independent dual vectors f^1, ..., f^n, f^{n+1} in X^*. By induction we can assume that
we have found vectors \xi_1, ..., \xi_n in X such that for 1 \le i, j \le n

    f^i(\xi_j) = \delta^i_j.                                                   (2.7a)

Then for every vector x in X we can define a vector x^\perp by writing

    x^\perp = x - f^i(x) \xi_i,                                                (2.7b)

the sum being over 1 \le i \le n. Suppose that for every x in X

    f^{n+1}(x^\perp) = 0.                                                      (2.8a)

Then, from (2.7b) and the linearity of f^{n+1},

    f^{n+1}(x) = f^i(x) f^{n+1}(\xi_i).                                        (2.8b)

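Definition (2.2) permits us to write (2.8b) in the form

    f^{n+1}(x) = [f^{n+1}(\xi_i) f^i](x).                                      (2.8c)

Since (2.8c) holds for every x in X, the functions f^{n+1} : X \to R and
[f^{n+1}(\xi_i) f^i] : X \to R must be the same. Thus f^{n+1} is a linear combination of
f^1, ..., f^n, contrary to the assumed linear independence. Hence (2.8a) must fail for at least
one x in X, and for that x we have f^{n+1}(x^\perp) = a \ne 0. We define

    x_{n+1} = a^{-1} x^\perp                                                   (2.9a)

and, for 1 \le j \le n,

    x_j = \xi_j - f^{n+1}(\xi_j) x_{n+1}.                                      (2.9b)

Then f^{n+1}(x_{n+1}) = 1 and f^{n+1}(x_j) = 0 for 1 \le j \le n. Moreover, from (2.7a) and (2.7b),
for 1 \le i \le n

    f^i(x_{n+1}) = 0;                                                          (2.9c)

and then for 1 \le i, j \le n equations (2.7a), (2.9b) and (2.9c) imply

    f^i(x_j) = \delta^i_j.

It follows that f^i(x_j) = \delta^i_j for 1 \le i, j \le n+1, and the induction is complete.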
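
The inductive construction can be followed step by step in a finite-dimensional setting. The
Python sketch below is only an illustration under our own assumptions: X is taken to be R^N, each
functional is stored as a row of N coefficients, exact rational arithmetic is used, and the names
apply and dual_vectors are hypothetical. At each stage it forms the projected vector of (2.7b),
searches the standard basis for a vector on which the new functional does not vanish after
projection, rescales it, and then corrects the earlier vectors.

```python
from fractions import Fraction


def apply(f, x):
    """Value f(x) of a functional f (a row of coefficients) on a vector x."""
    return sum(fi * xi for fi, xi in zip(f, x))


def dual_vectors(functionals):
    """Return x_1..x_n with f^i(x_j) = 1 if i == j else 0, built by induction."""
    dim = len(functionals[0])
    xs = []
    for k, f_new in enumerate(functionals):
        for e_index in range(dim):
            # Start from a standard basis vector and project as in (2.7b).
            x_perp = [Fraction(int(i == e_index)) for i in range(dim)]
            e = list(x_perp)
            for f_old, x_old in zip(functionals[:k], xs):
                c = apply(f_old, e)
                x_perp = [p - c * q for p, q in zip(x_perp, x_old)]
            a = apply(f_new, x_perp)
            if a != 0:
                break
        else:
            # The new functional vanishes on every projected vector.
            raise ValueError("functionals are linearly dependent")
        x_new = [p / a for p in x_perp]
        # Remove from each earlier vector its component seen by the new functional.
        xs = [[p - apply(f_new, x_old) * q for p, q in zip(x_old, x_new)]
              for x_old in xs]
        xs.append(x_new)
    return xs


fs = [[1, 2, 0], [0, 1, 1]]          # two illustrative functionals on R^3
for x in dual_vectors(fs):
    print([str(c) for c in x])
```

Because the starting vector at each stage is arbitrary among those with nonzero projection, other
choices would give other dual lists.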

The foregoing proof makes clear that the vectors x_1, ..., x_n in (2.6) are not uniquely
determined by f^1, ..., f^n unless n = dim X. However, x_1, ..., x_n are linearly independent, for
if a^1, ..., a^n \in R and a^j x_j = 0 then (2.6) implies a^j \delta^i_j = 0, so
a^1 = ... = a^n = 0. The list of dual vectors f^1, ..., f^n and the list of vectors x_1, ..., x_n
are said to be dual to one another.

Now we return to the inverse problem (2.3) and its need for prior information about x_E. We
write the vectors F(x) and G(x) from (2.3) in the forms

    F(x) = (F^1(x), ..., F^d(x))                                               (2.10a)

    G(x) = (G^1(x), ..., G^p(x)).                                              (2.10b)

Here F^i : X \to R and G^j : X \to R are linear functions. They are dual vectors, members of the
dual space X^*. For simplicity, we assume there are no errors in the data, that F^1, ..., F^d are
linearly independent, and that p = 1. If G^1 is a linear combination of F^1, ..., F^d, then clearly
G^1(x_E), the prediction, can be calculated directly from F(x_E), the data vector (Backus and
Gilbert, 1967, consider this case at length). The uniqueness problem and the need for prior
information arise when G^1 is not a linear combination of F^1, ..., F^d. In this case,
F^1, ..., F^d, G^1 are linearly independent. Then we choose model vectors x_1, ..., x_d, \xi_1 dual
to F^1, ..., F^d, G^1. Thus for 1 \le i, j \le d

    F^i(x_j) = \delta^i_j.                                                     (2.11a)

Also, for 1 \le i \le d

    F^i(\xi_1) = G^1(x_i) = 0,                                                 (2.11b)

and finally

    G^1(\xi_1) = 1.                                                            (2.11c)

Now let b be any real number, and define

    x = x_E + b \xi_1.                                                         (2.12)

From (2.11), clearly

    F^i(x) = F^i(x_E) = y^i                                                    (2.13)

and

    G^1(x) = G^1(x_E) + b = z + b.                                             (2.14)

From (2.13), x satisfies the data just as well as does x_E! From (2.14), G^1(x) differs from z by
b, which is arbitrary. Thus if all we know about x_E is that it satisfies the data, we can put no
limits whatever on the possible values of the prediction z = G^1(x_E).
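
A three-dimensional toy case shows the same ambiguity numerically. The Python sketch below is an
illustration only: the functionals F and G, the vector xi (standing for \xi_1) and the vector
x_true (standing for x_E) are made-up values chosen so that (2.11b) and (2.11c) hold.

```python
def apply(f, x):
    """Value of the linear functional f (a row of coefficients) on the vector x."""
    return sum(fi * xi for fi, xi in zip(f, x))


F = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]   # two data functionals (hypothetical)
G = (1.0, 1.0, 1.0)                       # prediction functional, not a combination of F
xi = (0.0, 0.0, 1.0)                      # F^i(xi) = 0 and G(xi) = 1, as in (2.11)
x_true = (3.0, -1.0, 2.0)                 # stand-in for the correct model x_E


def shifted_model(b):
    """The model x = x_E + b * xi of (2.12)."""
    return tuple(xe + b * s for xe, s in zip(x_true, xi))


def data(x):
    return [apply(f, x) for f in F]


def prediction(x):
    return apply(G, x)


for b in (0.0, 5.0, -1.0e6):
    x = shifted_model(b)
    print(b, data(x), prediction(x))
```

Every value of b reproduces the same data vector while moving the prediction by exactly b, which
is the content of (2.13) and (2.14).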

The remedy for this difficulty is apparent. To make z = G(x) very different from G(x_E), we
must make b very large in (2.12). But if b is too large, the model x = x_E + b \xi_1 will be
rejected as physically unreasonable. It is the careful examination of what "physically
unreasonable" means which introduces our prior information or beliefs about x_E into the inverse
problem (2.3). To enable the data y to restrict the prediction vector z, we must have some prior
information about the correct earth model x_E. This prior information must confine x_E to a
tractable subset of X, at least with high probability (unless, of course, the prior information is
simply the value of z). The manner in which hard or soft prior bounds on x_E (inequalities or
probability distributions for x_E) reduce the ambiguity in linear inverse problems is discussed
elsewhere. See Heustis & Parker (1977) for the use of linear programming with linear inequalities,
or Backus (1970a) for hard quadratic inversion with quadratic inequalities. For objective soft
bounds (objectively defensible probability distributions), Franklin (1970) and Jackson (1979)
discuss stochastic inversion (SI). For subjective soft bounds (personal probability distributions)
Backus (1970b) and Tarantola and Valette (1982) discuss Bayesian inference (BI). Backus (1987)
gives brief reviews of both SI and BI.

The foregoing conclusions are purely algebraic. They do not require a topology on X,
much less a norm or an inner product. The belief that a certain topology on X is relevant to the
real earth is itself a prior belief which can be used in the inverse problem. In the subsequent
sections, we will see how quadratic hard bounds and probability distributions both lead to
physically natural inner products on X. Usually, this prior information is the only natural source
of such an inner product on X.

3. Information lost and gained in softening a hard bound 

In the problem of geomagnetic downward continuation, let g_l^m be the Schmidt semi-normalized
gauss coefficient of degree l and longitudinal order m at the core-mantle boundary (CMB), measured
in nanoTeslas. Our belief that the energy of the geomagnetic field B cannot have a rest mass
greater than that of the earth leads us to accept

    \sum_{l=1}^{\infty} (l+1)(2l+1)^{-1} \sum_{m=-l}^{l} |g_l^m|^2
        \le 2 \times 10^{[exponent illegible]} nT^2                            (3.1)

(Backus, 1987). Our belief that the total rate of heat flow out of the earth's surface is larger
than Gubbins' (1975) expression for the minimum rate of ohmic heating in the core leads to

    \sum_{l=1}^{\infty} l^{-1}(l+1)(2l+1)(2l+3) \sum_{m=-l}^{l} |g_l^m|^2
        \le 3 \times 10^{17} nT^2                                              (3.2)

if we think that the electrical conductivity in the core is everywhere less than
3 \times 10^5 mho/meter (Backus, 1987). Both prior beliefs, (3.1) and (3.2), are examples of
quadratic bounds (1.1). Both beliefs are imprecise. Most geophysicists would confidently reduce the
right side of (3.1) by several orders of magnitude. Geophysicists who believe that at least
two-thirds of the surface heat flow comes from radioactivity in the crust and mantle would be
willing to reduce 3 \times 10^{17} to 10^{17} in (3.2). But others might want to replace
3 \times 10^{17} by 10^{18} in (3.2) because heat pulses from a chaotic core dynamo could produce
unsteady heat flow, or because some of the ohmic heat might be produced in the hot source of the
core heat engine and recycled into magnetic energy rather than lost to the mantle (Backus, 1975).
Prior information, although essential to inversion, is almost always imprecise. One way to deal
with this imprecision is to verify that the conclusions drawn from an inversion are not sensitive
to the bounds on the right of (3.1) or (3.2), as long as those bounds remain in a physically
defensible range. Another way is to try to represent the imprecision by replacing (1.1) by a
probability distribution p_X on the model space X (Backus, 1970b; Jackson, 1979; Gubbins, 1983).
At first sight, this appears reasonable. We show in the present section that, on the contrary,
"softening" (1.1) to a probability distribution adds considerable information about x_E which is
not implied by (1.1).

The natural way to replace (3.2) with a probability distribution p_X is to assume that the g_l^m
are independent gaussian random variables with means 0 and variances
3 \times 10^{17} l (l+1)^{-1} (2l+1)^{-1} (2l+3)^{-1} nT^2. This p_X raises some questions which
we want to discuss in general, so we consider the general case (1.1). We are given a model space X
and a positive definite quadratic form Q_X on X, and we believe that the real earth satisfies
(1.1).

Then we can introduce on X the dot product x_1 \cdot x_2, defined by

    x_1 \cdot x_2 = Q_X(x_1, x_2)                                              (3.3a)

and we can define the length \|x\| of the model x as

    \|x\| = (x \cdot x)^{1/2}.                                                 (3.3b)

If X is not complete in the norm (3.3b), we can complete it, and when we do, X becomes a Hilbert
space with inner product (3.3a) (Halmos, 1951). In X we can always find an orthonormal
basis. We will assume that this basis is denumerable, as is the case in the geomagnetic example
and all others where the models are well-behaved scalar or vector fields on finite dimensional
domains. Thus there is an infinite sequence of vectors k_1, k_2, ... in X such that

    k_i \cdot k_j = \delta_{ij}                                                (3.4a)

and if x is any vector in X then

    x = \sum_{i=1}^{\infty} x_i k_i                                            (3.4b)

where

    x_i = k_i \cdot x.                                                         (3.4c)

The convergence in (3.4b) is with respect to the norm (3.3b). Now the condition (1.1) for physical
acceptability of the model x can be written simply as

    \|x\| \le 1                                                                (3.5a)

or

    \sum_{i=1}^{\infty} x_i^2 \le 1.                                           (3.5b)

We want to try to "soften" (3.5) to a probability distribution p_X which injects no new
information about x not already contained in (3.5). A plausible softening procedure begins with the
observation that if (3.5) is really all we know, then all we are entitled to claim about any one
component x_i is

    -1 \le x_i \le 1.                                                          (3.6)

We can inject imprecision into (3.6) by regarding x_i as a random variable with a one-dimensional
probability distribution p_i whose mean and variance are 0 and 1. If, as (3.5b) indicates, we
really know nothing to distinguish the separate x_i's, then all the p_i should be the same as p_1.
The x_1, x_2, ... should be identically distributed. Furthermore, if p_X distinguishes between x_i
and -x_i then p_X includes prior information not present in (3.5). Therefore, using p_X as in (2.4)
to calculate expected values, we should obtain \langle x_i x_j \rangle = 0 when i \ne j. Hence

    \langle x_i \rangle = 0                                                    (3.7a)

    \langle x_i x_j \rangle = \delta_{ij}.                                     (3.7b)

At this point we encounter a well-known difficulty. From (3.7b) follows

    \langle \sum_{i=1}^{\infty} x_i^2 \rangle = \infty.                        (3.7c)

As a probabilistic analogue of (3.5b), (3.7c) is disappointing. Softening (3.5) to a probability
distribution appears to have destroyed some information. This accounts for the choice of words: p_X
is a "softened" version of (3.5), being fuzzier than (3.5).
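
The divergence in (3.7c) can be seen directly for the softened heat flow bound, in which each
g_l^m is an independent gaussian with mean 0 and variance
3 \times 10^{17} l (l+1)^{-1} (2l+1)^{-1} (2l+3)^{-1} nT^2. The Python sketch below is illustrative
only; the function names and the truncation degrees are our own choices. It evaluates the expected
value of the left side of (3.2) when the sum over l is truncated at a finite degree.

```python
def heat_bound_weight(l):
    """Coefficient multiplying |g_l^m|^2 on the left side of (3.2)."""
    return (l + 1) * (2 * l + 1) * (2 * l + 3) / l


def softened_variance(l, bound=3e17):
    """Variance (nT^2) of each g_l^m when (3.2) is softened to independent gaussians."""
    return bound / heat_bound_weight(l)


def expected_truncated_sum(max_degree, bound=3e17):
    """Expected value of the left side of (3.2), truncated at degree max_degree."""
    return sum((2 * l + 1) * heat_bound_weight(l) * softened_variance(l, bound)
               for l in range(1, max_degree + 1))


for max_degree in (1, 10, 31, 100):
    # Ratio of the expected truncated sum to the hard bound 3e17 nT^2.
    print(max_degree, expected_truncated_sum(max_degree) / 3e17)
```

Each coefficient contributes one full unit of the bound to the expectation, so the ratio grows
like the number of coefficients retained and has no finite limit.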

Gubbins (1983) and Backus (1987) discuss a possible remedy for (3.7c). We can introduce
convergence factors \kappa_1, \kappa_2, ... such that

    0 < \kappa_i < 1                                                           (3.8a)

and

    \sum_{i=1}^{\infty} \kappa_i^2 = 1.                                        (3.8b)

Then we can replace (3.7b) by

    \langle x_i x_j \rangle = \kappa_i^2 \delta_{ij}.                          (3.8c)

Now (3.7c) is replaced by

    \langle \sum_{i=1}^{\infty} x_i^2 \rangle = 1,                             (3.9a)

as we might hope from (3.5). However, (3.8c) and (3.7a) are the softened version of the opinion

    -\kappa_i \le x_i \le \kappa_i.                                            (3.9b)

Since (3.8b) requires \kappa_i \to 0 as i \to \infty, clearly (3.8c) goes well beyond (3.6) or even
(3.5b) as a statement of what we claim to believe about x_E. The use of convergence factors commits
us to accepting much prior information not implied by (3.5). If we really believe this extra prior
information, we are certainly entitled to use it in inverting the data. Unfortunately, convergence
factors are usually introduced ad hoc, simply to avoid (3.7c) and with no other evidence to support
them. Conclusions drawn from such prior information are as speculative as the information itself.
We are trying to find all physically reasonable models which fit the data with statistically
acceptable accuracy, so we prefer to invoke only prior information for which we have good evidence.
If (3.5) really represents all the prior information we are willing to accept, then we cannot
introduce convergence factors. We must accept (3.7) as properties of any p_X which softens (3.5)
and adds no new information. (For a less puritanical view, see Backus, 1987.)

To construct p_X, we must arrange that all the one-dimensional marginal distributions
p_1, p_2, ... are the same, and have mean 0 and variance 1. Thus, choosing p_1 determines all the
p_i. It does not determine p_X. To find p_X from p_1, we note that (3.7b) suggests (but does not
require) that x_1, x_2, x_3, ... be assumed independent. If dim X = n < \infty, then the density
function for p_X in the case of independence becomes simply the product of the densities of
x_1, ..., x_n. If dim X = \infty, then p_X is the Kolmogorov distribution whose projections onto
finite-dimensional subspaces of X have the product densities (Kolmogorov, 1950; see also Backus,
1987).

Now suppose we make the very mild assumption that the one-dimensional distribution p_1
for the separate x_i's has a fourth moment, which we write as K+1. If p_1 is gaussian, this is
certainly true, and K = 2. But if \langle x_i^4 \rangle exists, then x_1^2, x_2^2, ... are
identically distributed independent random variables with mean \langle x_i^2 \rangle = 1 and
variance \langle x_i^4 \rangle - 1 = K. For each integer n, define the random variable

    S_n(x) = n^{-1} \sum_{i=1}^{n} x_i^2.                                      (3.10)

Then S_n has mean 1 and variance K/n. When n >> 1, the central limit theorem (Kendall and
Stuart, 1977, p. 206) says that S_n(x) is approximately gaussian. The probability that a
one-dimensional gaussian variable is more than three standard deviations from its mean is slightly
less than 0.003. Thus, with a probability slightly more than 0.997,

    |S_n(x) - 1| < 3 (K/n)^{1/2}.                                              (3.11)

In short, if n is very large, with high probability we can infer from the softened version of (3.5)
a very accurate estimate of the value of S_n(x_E) for the correct earth model x_E. We obtain this
estimate without any data. It was certainly not present in (3.5). It represents new "information"
generated in the process of softening a hard quadratic bound to a probability distribution. The
same argument applies, of course, to the data space Y in (2.3). If its dimension is large, we know
more about the random error \delta_R y than about the systematic error \delta_X y, because the
former is constrained by a probability distribution p_R on Y, the latter only by a quadratic
inequality.
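
The concentration expressed by (3.11) is easy to observe by simulation. The following Python
sketch is a rough illustration under our own assumptions: p_1 is taken to be the standard gaussian
(so K = 2), and the values of n, the number of trials and the seed are arbitrary. The function
names are hypothetical.

```python
import math
import random


def sample_S_n(n, rng):
    """One draw of S_n = (1/n) * sum of x_i^2 for independent standard gaussian x_i."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n


def fraction_within_bound(n, trials, seed=0, K=2.0):
    """Fraction of trials in which |S_n - 1| < 3 * sqrt(K / n), as in (3.11)."""
    rng = random.Random(seed)
    bound = 3.0 * math.sqrt(K / n)
    hits = sum(abs(sample_S_n(n, rng) - 1.0) < bound for _ in range(trials))
    return hits / trials


# With n = 1000 the bound 3 * sqrt(K / n) is about 0.13.
print(fraction_within_bound(n=1000, trials=200))
```

No data enter this calculation: the narrow range for S_n comes entirely from the assumed
distribution.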

Softening (3.5) to a probability distribution $p_X$ in the manner just described generates still
more information about $x_E$. Let $p_X(U)$ denote the probability that $x_E$ is a member of the subset
$U$ of $X$. For any positive $a$ and any integer $n$, let $X_n(a)$ be the set of models $x$ for which

$$n S_n(x) \le a \tag{3.12a}$$

where $S_n$ is defined by (3.10). Let $X_\infty(a)$ be the set of models $x$ such that

$$\sum_{i=1}^{\infty} x_i^2 \le a . \tag{3.12b}$$

Let $X_\infty$ be the set of models $x$ such that

$$\sum_{i=1}^{\infty} x_i^2 < \infty . \tag{3.12c}$$

Since $n S_n(x)$ is approximately gaussian with mean $n$ and standard deviation $(Kn)^{1/2}$, it follows that

$$\lim_{n \to \infty} p_X[X_n(a)] = 0 . \tag{3.13a}$$

But for each $a$, $X_\infty(a) \subset X_{n+1}(a) \subset X_n(a)$. Therefore (Halmos, 1950, p. 38)

$$p_X[X_\infty(a)] = 0 . \tag{3.13b}$$

Now for every integer $n$, $X_\infty(n) \subset X_\infty(n+1) \subset X_\infty$, while if $x \in X_\infty$ then $x \in X_\infty(n)$ for some $n$.
Therefore (Halmos, 1950, p. 38)

$$p_X(X_\infty) = 0 . \tag{3.13c}$$

In short, if we soften (3.5) to a probability distribution in the obvious way, far from losing
information, as seems to be suggested by (3.7c), we are led to espouse the belief that with probability
1, $\|x_E\| = \infty$. Not only does $x_E$ violate (3.5); it is altogether outside the model space $X$. The
softening process leads a geomagnetician who initially accepts (3.2) to convert to the belief that
the ohmic heat production rate in the core is infinite.

Technically, what has happened is that the Kolmogorov distribution $p_X$ obtained by softening
(3.5) has as its natural domain the set of all sequences $(x_1, x_2, \ldots)$, square summable or not in
the sense of (3.5b). We have just deduced that $p_X$ assigns probability 0 to the set of all those
sequences which are square summable.
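The limit (3.13a) can be made concrete in the exactly gaussian case, where $n S_n(x)$ has a chi-square distribution with $n$ degrees of freedom. The sketch below is illustrative only: it assumes independent standard gaussian $x_i$ and even $n$, for which the chi-square probability equals a Poisson tail sum. The function name `prob_in_X_n` is hypothetical.

```python
import math

def prob_in_X_n(a, n):
    """P(x_1^2 + ... + x_n^2 <= a) for independent standard gaussians, n even.

    For n = 2k this equals exp(-a/2) * sum_{j >= k} (a/2)^j / j!,
    summed here term by term until the terms are negligible.
    """
    if n % 2 != 0 or n <= 0:
        raise ValueError("this closed form needs a positive even n")
    half = a / 2.0
    k = n // 2
    term = math.exp(-half) * half ** k / math.factorial(k)  # the j = k term
    total = 0.0
    j = k
    while term > 0.0 and term > 1e-17 * total:
        total += term
        j += 1
        term *= half / j
    return total

for n in (2, 10, 50, 200):
    print(n, prob_in_X_n(1.0, n))
```

With $a = 1$, the probability of the set $X_n(a)$ in (3.12a) falls off very rapidly as $n$ grows, which is the behaviour that (3.13a) asserts in the limit.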

The argument leading to (3.13c) depends crucially on the assumptions that $x_1, x_2, \ldots$ have
the same one-dimensional distribution $p_1$, and that $x_1, x_2, \ldots$ are independent. The former
assumption simply says that our prior information (3.5) does not distinguish between $x_i$ and $x_j$.
This seems a fair view of (3.5). However, the latter assumption, independence, is not obviously
contained in (3.5), and it leads to disaster.

Equation (3.7b) does not really entitle us to assume that $x_i$ and $x_j$ are independent, but only
that they are uncorrelated. If they are gaussian, lack of correlation implies independence, so if we
are to save the idea of bound softening, we must accept model parameters $x_1, x_2, \ldots$ in (3.4b)
which are neither gaussian nor independent. We explore this question in the next section.

4. Bound softening with dependent, nongaussian model parameters 

The new information in (3.11) arises from the fact that when $x_1, x_2, \ldots$ are identically distributed,
independent random variables with mean zero, and

$$r_n^2 = \sum_{i=1}^{n} x_i^2 \tag{4.1a}$$

then the standard deviation of $r_n^2$,

$$\sigma(r_n^2) = \left[ \langle r_n^4 \rangle - \langle r_n^2 \rangle^2 \right]^{1/2} , \tag{4.1b}$$

grows more slowly with $n$ than the expected value, $\langle r_n^2 \rangle$. Thus the relative error in $r_n^2$,
$\sigma(r_n^2)/\langle r_n^2 \rangle$, tends to zero as $n$ becomes large. We seek a probability distribution $p_X$ on $X$ which
avoids this difficulty and represents a fuzzy version of (3.5). Clearly we cannot demand that $p_X$
be gaussian or that $x_1, x_2, \ldots$ be independent. We want to avoid any $p_X$ which introduces new
information not contained in (3.5). One obvious property of (3.5) is its failure to distinguish
among $x_1, x_2, \ldots$; in fact (3.5) is unchanged by any rotation of the axes $\hat{x}_1, \hat{x}_2, \ldots$ in $X$. It seems
reasonable to ask that $p_X$ have this property. We will make only the slightly weaker demand that
$p_X$ be unaltered when any finite number of the axes $\hat{x}_i$ are rotated among themselves, leaving
the others fixed. We call such a probability distribution on $X$ "isotropic."

We denote by $X_n$ the subspace of $X$ consisting of vectors of the form

$$r_n = \sum_{i=1}^{n} x_i \hat{x}_i \tag{4.2a}$$

and by $p_n$ the marginal distribution of $p_X$ on $X_n$. If each $p_n$ has a density function, $f_n$, we will
call $p_X$ "regular." Backus (1988) shows that every isotropic $p_X$ is the weighted sum of a regular
isotropic $p_X$ and a $p_X$ which concentrates all its probability at 0. For simplicity, here we consider
only a regular $p_X$. Then the isotropy of $p_X$ requires that $f_n$ depend on $x_1, \ldots, x_n$ only through the
$r_n^2$ of (4.1a). Thus

$$dp_n(r_n) = f_n(x_1^2 + \cdots + x_n^2)\, dx_1 \cdots dx_n . \tag{4.2b}$$

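We want to calculate the mean, $\langle r_n^2 \rangle$, and the standard deviation, $\sigma(r_n^2)$, of $r_n^2$ for an isotropic
distribution $p_X$. The spherical symmetry of $p_X$ implies

$$\langle x_i x_j \rangle = \langle x^2 \rangle \delta_{ij} \tag{4.3a}$$

$$\langle x_i^4 \rangle = \langle x^4 \rangle \tag{4.3b}$$

$$\langle x_i^2 x_j^2 \rangle = \langle x^2 y^2 \rangle \quad \text{if } i \ne j . \tag{4.3c}$$

Here we have written $x$ for $x_1$ and $y$ for $x_2$. Furthermore, clearly

$$\langle x^2 \rangle = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\; x^2 f_2(x^2 + y^2)$$

$$\langle x^4 \rangle = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\; x^4 f_2(x^2 + y^2)$$

$$\langle x^2 y^2 \rangle = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\; x^2 y^2 f_2(x^2 + y^2) .$$

If we perform these integrations in plane polar coordinates, with $x = \xi^{1/2} \cos\theta$, $y = \xi^{1/2} \sin\theta$, we
obtain

$$\langle x^2 \rangle = (\pi/2) \int_0^\infty d\xi\; \xi f_2(\xi) \tag{4.4a}$$

$$\langle x^2 y^2 \rangle = (\pi/8) \int_0^\infty d\xi\; \xi^2 f_2(\xi) \tag{4.4b}$$

$$\langle x^4 \rangle = 3 \langle x^2 y^2 \rangle . \tag{4.4c}$$

From (4.3)

$$\langle r_n^2 \rangle = n \langle x^2 \rangle , \tag{4.5a}$$

and

$$\langle r_n^4 \rangle = n \langle x^4 \rangle + n(n-1) \langle x^2 y^2 \rangle ,$$

so, from (4.4c)

$$\langle r_n^4 \rangle = n(n+2) \langle x^2 y^2 \rangle . \tag{4.5b}$$

Therefore

$$\sigma(r_n^2) / \langle r_n^2 \rangle = \left[ \kappa - 1 + 2\kappa n^{-1} \right]^{1/2} \tag{4.6a}$$

where

$$\kappa = \langle x^2 y^2 \rangle \langle x^2 \rangle^{-2} . \tag{4.6b}$$

The difficulty (3.11) arises because when $x_1, x_2, \ldots$ are independent, $\kappa = 1$. Can we choose $p_X$ so
$\kappa > 1$?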

Evidently we must learn to construct isotropic probability distributions. The construction is
based on the observation that since $p_n$ is the marginal distribution of $p_X$ on $X_n$, it is also the
marginal distribution of $p_{n+1}$ on $X_n$. Therefore one of the Kolmogorov consistency conditions
(Kolmogorov, 1950) is

$$f_n(r_n^2) = \int_{-\infty}^{\infty} dz\; f_{n+1}(r_n^2 + z^2) . \tag{4.7}$$

If we define $\xi = r_n^2$ and choose the new variable of integration $\eta = r_n^2 + z^2$, then (4.7) becomes

$$f_n(\xi) = \int_\xi^\infty d\eta\; (\eta - \xi)^{-1/2} f_{n+1}(\eta) . \tag{4.8}$$

Following Abel (Tricomi, 1957, p. 39) we solve (4.8) for $f_{n+1}$ by appealing to the identity

$$\int_\xi^\eta d\zeta\; (\zeta - \xi)^{-1/2} (\eta - \zeta)^{-1/2} = \pi . \tag{4.9}$$

We multiply (4.8) by $(\xi - \zeta)^{-1/2}$, integrate over $\xi$ from $\zeta$ to $\infty$, reverse the order of the integrals on
the right, and use (4.9) to obtain

$$\int_\zeta^\infty d\xi\; (\xi - \zeta)^{-1/2} f_n(\xi) = \pi \int_\zeta^\infty d\eta\; f_{n+1}(\eta) .$$

Differentiating with respect to $\zeta$ and relabeling variables gives

$$f_{n+1}(\xi) = -\pi^{-1} \partial_\xi \int_\xi^\infty d\eta\; (\eta - \xi)^{-1/2} f_n(\eta) . \tag{4.10}$$

Equations (4.8) and (4.10) make very clear that we cannot choose $f_1, f_2, f_3, \ldots$ independently. If
we choose one of them, all the others are determined. If we choose $f_n$ for some particular $n$,
how much freedom do we have in this choice?
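As a concrete check of the consistency condition (4.7), one can integrate a known $(n+1)$-dimensional marginal over the extra coordinate and compare with the $n$-dimensional one. The sketch below does this for the gaussian marginals $f_n(\xi) = (2\pi)^{-n/2} e^{-\xi/2}$. The helper names `f_gauss` and `marginalize`, the truncation `z_max`, and the step count are illustrative assumptions, and the midpoint rule is just one convenient quadrature.

```python
import math

def f_gauss(n, xi):
    """n-dimensional gaussian marginal density as a function of xi = r_n^2."""
    return (2.0 * math.pi) ** (-n / 2.0) * math.exp(-xi / 2.0)

def marginalize(f_next, xi, z_max=12.0, steps=24000):
    """Midpoint-rule estimate of the z-integral in (4.7): int dz f_{n+1}(xi + z^2)."""
    h = 2.0 * z_max / steps
    total = 0.0
    for k in range(steps):
        z = -z_max + (k + 0.5) * h
        total += f_next(xi + z * z)
    return total * h

n, xi = 3, 1.7
lhs = f_gauss(n, xi)                                      # f_n(xi)
rhs = marginalize(lambda eta: f_gauss(n + 1, eta), xi)    # integral of f_{n+1}
print(lhs, rhs)
```

The two numbers agree to within the quadrature error, as (4.7) requires of any consistent family of marginals.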

If we want to choose one of the marginal densities $f_n$ and construct all the others from (4.8)
and (4.10), two limitations on our choice are that

$$f_n(\xi) \ge 0 \tag{4.11a}$$

and that

$$\int_{X_n} dx_1 \cdots dx_n\; f_n(x_1^2 + \cdots + x_n^2) = 1 . \tag{4.11b}$$

Carrying out the integral in $n$-dimensional spherical polar coordinates gives

$$\pi^{n/2}\, \Gamma(n/2)^{-1} \int_0^\infty d\xi\; \xi^{n/2 - 1} f_n(\xi) = 1 . \tag{4.11c}$$

A further limitation on our choice of $f_n$ is that any $f_m$ constructed from $f_n$ by (4.8) or (4.10)
should also satisfy (4.11). Equations (4.11b,c) cause no trouble. The constructions (4.8) and
(4.10) are equivalent to (4.7), which guarantees (4.11b) for all $m$. There remains (4.11a). If
$m < n$, (4.11a) for $f_n$ and (4.8) imply $f_m(\xi) \ge 0$ for all $\xi$. To see whether (4.10) ensures $f_m(\xi) \ge 0$
for all $\xi$ when $m > n$ requires some care.

We begin by appealing to Abel's identity (4.9) to simplify (4.10). We iterate (4.8) once, to
express $f_n$ in terms of $f_{n+2}$. Then we reverse the order of the double integral on the right and use
(4.9) to obtain

$$f_n(\xi) = \pi \int_\xi^\infty d\eta\; f_{n+2}(\eta) . \tag{4.12a}$$

Differentiating with respect to $\xi$ gives

$$f_{n+2}(\xi) = -\pi^{-1} \partial_\xi f_n(\xi) . \tag{4.12b}$$

This situation is familiar from seismic travel time inversion. The square of the operator applied
to $f_{n+1}$ in (4.8) is $\pi$ times the integration operator, and the square of the operator applied to $f_n$ in
(4.10) is $-\pi^{-1}$ times the differentiation operator. Given $f_n$, we find $f_{n+1}$ from (4.10), and then we
can find all other $f_m$ with $m \ge n$ by means of (4.12b).
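The step from $f_n$ to $f_{n+2}$ in (4.12b) is a single differentiation, which is easy to try numerically. The sketch below applies a central difference to the gaussian marginal $f_2(\xi) = (2\pi)^{-1} e^{-\xi/2}$ and compares the result with the gaussian $f_4$. The names `f_gauss` and `step_up_two` and the step size `h` are illustrative assumptions, not anything prescribed by the text.

```python
import math

def f_gauss(n, xi):
    """n-dimensional gaussian marginal density as a function of xi = r_n^2."""
    return (2.0 * math.pi) ** (-n / 2.0) * math.exp(-xi / 2.0)

def step_up_two(f_n, xi, h=1e-4):
    """Apply (4.12b): f_{n+2}(xi) = -(1/pi) d f_n / d xi, via a central difference."""
    return -(f_n(xi + h) - f_n(xi - h)) / (2.0 * h * math.pi)

xi = 0.8
approx_f4 = step_up_two(lambda s: f_gauss(2, s), xi)
exact_f4 = f_gauss(4, xi)
print(approx_f4, exact_f4)
```

Because each step up by two dimensions costs one derivative, a density chosen in low dimension must be differentiable as many times as one wishes to continue it upward.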


It is clear from (4.12a) that every marginal density function $f_n$ of an isotropic probability
distribution $p_X$ has at least one derivative, $\partial_\xi f_n(\xi)$. But then induction on (4.12a) shows that
each $f_n(\xi)$ is infinitely differentiable for all $\xi \ge 0$. If we are trying to construct an isotropic $p_X$ by
choosing one of its marginal densities $f_n$, we must be careful to choose $f_n$ to be infinitely
differentiable.

Furthermore, the derivatives of $f_n$ cannot be arbitrary. If we want $f_{n+2q}(\xi) \ge 0$ for all $\xi \ge 0$
and all integers $q \ge 0$, (4.12b) shows that we must choose an $f_n$ such that

$$(-\partial_\xi)^q f_n(\xi) \ge 0 \tag{4.13a}$$

for all $\xi \ge 0$ and all integers $q \ge 0$. Moreover, $f_n$ must be such that when $f_{n+1}$ is constructed from
it via (4.10), we have

$$(-\partial_\xi)^q f_{n+1}(\xi) \ge 0 . \tag{4.13b}$$

In fact, (4.10) and (4.13a) imply (4.13b). To see this we observe that if $f$ is twice continuously
differentiable and dies away at infinity rapidly enough to permit the integrals to converge, then

$$\partial_\xi \int_\xi^\infty d\eta\; (\eta - \xi)^{-1/2} f(\eta) = \int_\xi^\infty d\eta\; (\eta - \xi)^{-1/2} \partial_\eta f(\eta) .$$

To prove this observation, integrate the integral on the left by parts, and differentiate under the
integral sign. Applying this observation $q+1$ times allows us to infer from (4.10) that

$$(-\partial_\xi)^q f_{n+1}(\xi) = \pi^{-1} \int_\xi^\infty d\eta\; (\eta - \xi)^{-1/2} (-\partial_\eta)^{q+1} f_n(\eta) .$$

Thus if (4.13a) is true for all $q$, so is (4.13b).

We conclude that $f_n(\xi)$ is the marginal $n$-dimensional density of an isotropic probability
distribution $p_X$ if and only if $f_n$ is continuously differentiable on $\xi \ge 0$ and satisfies (4.13a) for all
$\xi \ge 0$ and all integers $q \ge 0$, and also satisfies (4.11c). The choice $f_n(\xi) = (2\pi)^{-n/2} e^{-\xi/2}$ meets
these conditions and the resulting $p_X$ is the gaussian considered in section 3. Another isotropic
probability distribution can be constructed from

$$f_2(\xi) = \pi^{-1} (\nu + 1)\, a^{\nu+1} (a + \xi)^{-(\nu+2)} \tag{4.14}$$

where $a$ is any positive constant and $\nu$ is any constant larger than $-1$.

Now that we have a nongaussian, dependent isotropic probability distribution, perhaps we
can escape from (3.11). It is easy to compute from (4.4) that if $\nu > 1$ the $p_X$ constructed from
(4.14) leads to

$$\langle x^2 \rangle = a/(2\nu)$$

and

$$\langle x^2 y^2 \rangle = a^2 / [4\nu(\nu - 1)] ,$$

so in (4.6)

$$\kappa = 1 + (\nu - 1)^{-1} .$$

Therefore $\sigma(r_n^2)/\langle r_n^2 \rangle$, the relative error in $r_n^2$, does not shrink to zero for large $n$ if we adopt the
$p_X$ constructed from (4.14). If we do not want to add information like (3.9b) to (3.5), we should
choose $\langle x^2 \rangle = 1$, so in (4.14) we want

$$a = 2\nu .$$
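The moments quoted for (4.14) can be verified by direct quadrature of (4.4a) and (4.4b). The following sketch is one way to do so under stated assumptions: it uses the illustrative values $\nu = 3$, $a = 2\nu = 6$, maps the half-line onto $(0,1)$ with the substitution $\xi = a t/(1-t)$, and applies a plain midpoint rule. The function names are hypothetical.

```python
import math

def f2(xi, a, nu):
    """Two-dimensional marginal density (4.14)."""
    return (nu + 1.0) * a ** (nu + 1.0) * (a + xi) ** (-(nu + 2.0)) / math.pi

def integral_xi_power(k, a, nu, steps=20000):
    """Midpoint estimate of int_0^inf d(xi) xi^k f2(xi), using xi = a*t/(1-t), 0 < t < 1."""
    h = 1.0 / steps
    total = 0.0
    for m in range(steps):
        t = (m + 0.5) * h
        xi = a * t / (1.0 - t)
        jacobian = a / (1.0 - t) ** 2  # d(xi)/dt
        total += xi ** k * f2(xi, a, nu) * jacobian
    return total * h

def moments_and_kappa(a, nu):
    mean_x2 = 0.5 * math.pi * integral_xi_power(1, a, nu)       # (4.4a)
    mean_x2y2 = 0.125 * math.pi * integral_xi_power(2, a, nu)   # (4.4b)
    return mean_x2, mean_x2y2, mean_x2y2 / mean_x2 ** 2          # (4.6b)

def relative_error(kappa, n):
    """sigma(r_n^2)/<r_n^2> from (4.6a)."""
    return math.sqrt(kappa - 1.0 + 2.0 * kappa / n)

nu = 3.0
a = 2.0 * nu
mean_x2, mean_x2y2, kappa = moments_and_kappa(a, nu)
print(mean_x2, mean_x2y2, kappa, relative_error(kappa, 10 ** 6))
```

For these values the quadrature reproduces $\langle x^2 \rangle = a/(2\nu)$ and $\kappa = 1 + (\nu-1)^{-1}$, and the relative error stays near $(\kappa - 1)^{1/2}$ however large $n$ becomes.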

Having escaped from (3.11), can we also escape from (3.13)? Unfortunately, the answer is
no. Every isotropic probability distribution $p_X$ will result in (3.13) or will assign probability 1 to
the origin, $0 = (0, 0, \ldots)$. The author's proof of this fact was complicated. Gary Egbert has found
a simpler proof of a more general result. Let $X$ be as in section 3. For any integer $i > 0$ and any
real $c$ and $d$, the set of all $x = (x_1, x_2, \ldots)$ in $X$ for which $c < x_i < d$ is called a "slab." Let $p_X$ be a
probability measure on $X$ which is able to assign probabilities to all slabs (i.e., all slabs are
measurable; see Halmos, 1950). We call $p_X$ "symmetric" if it is unchanged when any two
coordinates $x_i$ and $x_j$ are interchanged. Clearly every isotropic probability measure on $X$ is symmetric.
Egbert proves that if $p_X$ is symmetric then it concentrates all probability at the origin.

Egbert's proof begins by defining

$$\sigma_i(a)^2 = \int_{X_\infty(a)} dp_X(x)\; x_i^2 \tag{4.15}$$

where $i$ is any positive integer, $a$ is any positive real number, and $X_\infty(a)$ is as in (3.12b). By the
Lebesgue monotone convergence theorem (Halmos, 1950),

$$\sum_{i=1}^{\infty} \sigma_i(a)^2 = \int_{X_\infty(a)} dp_X(x) \left[ \sum_{i=1}^{\infty} x_i^2 \right] . \tag{4.16}$$

But $\sum_{i=1}^{\infty} x_i^2 \le a$ in $X_\infty(a)$, so by (4.16)

$$\sum_{i=1}^{\infty} \sigma_i(a)^2 \le a\, p_X[X_\infty(a)] \le a . \tag{4.17}$$

Since $p_X$ is symmetric, $\sigma_i(a)^2$ is the same for all $i$, so (4.17) implies $\sigma_i(a)^2 = 0$ for all $i$.
Therefore (4.16) implies

$$\int_{X_\infty(a)} dp_X(x) \left[ \sum_{i=1}^{\infty} x_i^2 \right] = 0 . \tag{4.18}$$

For any real $\epsilon$ and $a$ with $0 < \epsilon < a$, define $X_\infty(\epsilon, a)$ to be the set of all $x$ in $X$ for which

$$\epsilon < \sum_{i=1}^{\infty} x_i^2 \le a .$$

Then clearly

$$\epsilon\, p_X[X_\infty(\epsilon, a)] \le \int_{X_\infty(\epsilon, a)} dp_X(x) \left[ \sum_{i=1}^{\infty} x_i^2 \right] \le \int_{X_\infty(a)} dp_X(x) \left[ \sum_{i=1}^{\infty} x_i^2 \right] .$$

Therefore, by (4.18),

$$p_X[X_\infty(\epsilon, a)] = 0 \tag{4.19}$$


for every real $\epsilon$ and $a$ with $0 < \epsilon < a$. Now let $X \setminus \{0\}$ denote $X$ with 0 removed. If $x$ is in
$X \setminus \{0\}$, then $0 < \|x\| < \infty$, so there is a positive integer $n$ such that $x$ is in $X_\infty(n^{-1}, n)$. Moreover,
if $m < n$ then $X_\infty(m^{-1}, m) \subset X_\infty(n^{-1}, n)$. It follows (Halmos, 1950, p. 38) that

$$p_X(X \setminus \{0\}) = \lim_{n \to \infty} p_X[X_\infty(n^{-1}, n)] .$$

Therefore, by (4.19),

$$p_X(X \setminus \{0\}) = 0 . \tag{4.20}$$

Equation (4.20) shows that not only every isotropic probability distribution on $X$ but even
every symmetric probability distribution concentrates all probability at the origin. If we do not
believe that $x = 0$ with probability 1, then the situation is as if $p_X$ were gaussian. The natural
domain of $p_X$ is the space $R^\infty$ of all infinite sequences $(x_1, x_2, \ldots)$, and $p_X$ assigns probability 0 to
the whole subspace of square-summable sequences. Isotropic probability distributions cannot
adequately represent the hard bound (3.5) even when they avoid the trap of (3.11).


The argument leading to (4.20) clearly depends on the fact that $X$ is infinite-dimensional.
Any particular inverse problem can be studied on a finite-dimensional model space, constructed
via (2.6) from a maximal linearly independent subset of the $F_1, \ldots, F_d, G_1, \ldots, G_p$ in (2.10).
Perhaps we should permit $\dim X = N < \infty$ and try to soften (3.5) with an isotropic probability
distribution $p_X$ on $X$. The probability densities $f_n$ of the marginal distributions on the subspaces $X_n$
defined by (4.2a) will be related as in (4.8) and (4.10), but the sequence $f_1, f_2, \ldots, f_N$ will
terminate at $f_N$, and from (4.12a) we will be able to claim about $f_n$ only that it has $(N-n)/2$ or
$(N-n-1)/2$ derivatives. The obvious candidate for a $p_X$ which softens (3.5) is the one whose
probability density on $X = X_N$ is constant when $\|x\| \le 1$ and 0 when $\|x\| > 1$. The constant can be
evaluated from (4.11c). Then, from (4.8) and (4.12), if $1 \le n \le N$ and $0 \le \xi \le 1$,

$$f_n(\xi) = \frac{(N/2)!}{(N/2 - n/2)!\; \pi^{n/2}}\, (1 - \xi)^{(N-n)/2} .$$

In particular,

$$f_2(\xi) = (2\pi)^{-1} N (1 - \xi)^{N/2 - 1} .$$

Then, from (4.4),

$$\langle x^2 \rangle = (N+2)^{-1} , \qquad \langle x^2 y^2 \rangle = (N+2)^{-1} (N+4)^{-1} ,$$

so

$$\langle r_n^2 \rangle = n (N+2)^{-1} \tag{4.21a}$$

and

$$\frac{\sigma(r_n^2)}{\langle r_n^2 \rangle} = \left[ \frac{2\,[(N+2)\, n^{-1} - 1]}{N+4} \right]^{1/2} . \tag{4.21b}$$

For large $N$, this distribution does not even avoid the trap (3.11).

It is possible to avoid (3.11) by using (4.14) but terminating the sequence $f_1, f_2, \ldots$ at $f_N$.
However, if $N$ is very large, (4.2b) and (4.12b) show that $p_X[X_N(1)]$ will become very small, and
this is the probability of the event (3.5), which our prior beliefs led us to feel sure about. Such a
difficulty will arise with any choice of $f_2(\xi)$ which can be continued via (4.10) to arbitrarily high
dimension $N$. If we choose an $f_2(\xi)$ which cannot be continued up beyond a particular $X_N$, how
do we justify our choice of $N$? An a priori restriction on the dimension of the model space is not
the sort of prior information that will be very convincing to modern workers. On the other hand,
if $f_2$ is chosen so that (4.10) terminates at $N = d + p$, then the personal probability distribution
we accept on $X$ before obtaining the data depends on how many data we are about to obtain.
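The ratio (4.21b) for the uniform density on the unit ball can be evaluated directly. The short sketch below computes it two ways: from the explicit expression, and from the general formula (4.6a) with $\kappa = (N+2)/(N+4)$, which follows from the moments quoted above. The function names and the sample values of $n$ and $N$ are illustrative assumptions.

```python
import math

def relative_error_uniform_ball(n, N):
    """sigma(r_n^2)/<r_n^2> for the uniform density on the unit ball in N dimensions."""
    return math.sqrt(2.0 * ((N + 2.0) / n - 1.0) / (N + 4.0))

def relative_error_from_kappa(n, N):
    """The same ratio from (4.6a), using kappa = <x^2 y^2>/<x^2>^2 = (N+2)/(N+4)."""
    kappa = (N + 2.0) / (N + 4.0)
    return math.sqrt(kappa - 1.0 + 2.0 * kappa / n)

for N in (10, 100, 1000):
    print(N, relative_error_uniform_ball(N, N), relative_error_from_kappa(N, N))
```

At $n = N$ the ratio equals $2/[N(N+4)]^{1/2}$, so for large $N$ this prior pins $\|x\|^2$ down to a relative accuracy of order $2/N$.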

5. Information lost in hardening a soft bound 

To compare further the information content of quadratic inequalities and probability distributions,
we will examine the question of trying to represent a probability distribution by a quadratic
inequality. Since our starting point is now a probability distribution $p_X$ on a linear space $X$, (3.13c)
leads us to assume that $X$ is finite dimensional, with $N = \dim X$. The practical application of this
section is thus to a discussion of the random errors $\delta_R y$ in the data space $Y$, and $p_X$ is the $p_R$ of
(2.4). However, to facilitate comparison with sections 3 and 4, and to use their notation, we
continue to write $X$ and $p_X$ for the linear space and the probability distribution on it.

The importance of not immediately identifying $X$ with a data space of column vectors is to
force us to recognize that $X$ is an abstract linear space without any structure or geometry except
what can be built from $p_X$. Our first task is to construct on $X$ from $p_X$ a positive definite
quadratic form $Q_X$. To do so, we recall the dual space $\tilde{X}$, consisting of all linear functionals
$f : X \to R$. Since $\tilde{X}$ is a linear space, it has a dual space $\tilde{\tilde{X}}$. If $x$ is any fixed vector in $X$, we can
view $x$ as a member $\tilde{\tilde{x}}$ of $\tilde{\tilde{X}}$. To do so, for every $f$ in $\tilde{X}$ we define

$$\tilde{\tilde{x}}(f) = f(x) . \tag{5.1a}$$

When $x$ is fixed, $\tilde{\tilde{x}}(f)$ is a real number which, by (2.2), depends linearly on $f$. Thus $\tilde{\tilde{x}}$ is indeed a
linear functional on $\tilde{X}$, i.e., a member of $\tilde{\tilde{X}}$. Furthermore, since $\dim X < \infty$, every linear
functional on $\tilde{X}$ is of the form (5.1a) for exactly one $x$ in $X$ (Halmos, 1958). Thus, in the sense of
(5.1a),

$$\tilde{\tilde{X}} = X . \tag{5.1b}$$

First we use $\tilde{X}$ to define the mean value $\langle x \rangle$ of $x$ under $p_X$. If $f \in \tilde{X}$, then $f(x)$ is a random
variable on $X$, with an expected value as defined in (2.4). From (2.4) and (2.2) $\langle f(x) \rangle$ depends
linearly on $f$, so by (5.1b) there is a unique vector in $X$, which we will denote by $\langle x \rangle$, such that

$$\langle f(x) \rangle = f(\langle x \rangle) \tag{5.2a}$$

for every $f$ in $\tilde{X}$. Now we can shift the origin of $X$ to $\langle x \rangle$ so as to achieve the result

$$\langle x \rangle = 0 . \tag{5.2b}$$

Next, we use $p_X$ to define a quadratic form $\tilde{Q}_X$ on $\tilde{X}$. For any $f_1$ and $f_2$ in $\tilde{X}$, $f_1(x) f_2(x)$ is
a random variable on $X$, so we can define

$$\tilde{Q}_X(f_1, f_2) = \langle f_1(x) f_2(x) \rangle . \tag{5.3a}$$

Clearly, $\tilde{Q}_X(f_1, f_2)$ is a real number which depends linearly on each of $f_1$ and $f_2$ when the other
is fixed. Also, clearly,

$$\tilde{Q}_X(f_1, f_2) = \tilde{Q}_X(f_2, f_1) . \tag{5.3b}$$

Finally we claim that if $f \ne 0$ then

$$\tilde{Q}_X(f, f) > 0 , \tag{5.3c}$$

i.e., $\tilde{Q}_X$ is positive definite. If $f \in \tilde{X}$, then

$$\tilde{Q}_X(f, f) = \langle f(x)^2 \rangle .$$

Thus $\tilde{Q}_X(f, f) \ge 0$. If $\tilde{Q}_X(f, f) = 0$, then with probability 1

$$f(x) = 0 . \tag{5.4}$$

If $f \ne 0$ then (5.4) describes an $(N-1)$-dimensional subspace of $X$ where $x$ is to be found with
probability 1. Obviously we should replace $X$ by that subspace. Continuing in this way by
induction, we finally reach (5.3c) unless $p_X$ concentrates all its probability on 0, a degenerate
case which we ignore.

The positive definite quadratic form $\tilde{Q}_X$ on $\tilde{X}$ defines a dot product on $\tilde{X}$, namely

$$f_1 \cdot f_2 = \tilde{Q}_X(f_1, f_2) = \langle f_1(x) f_2(x) \rangle . \tag{5.5}$$

But once we have a dot product on a finite-dimensional vector space $\tilde{X}$, we can identify $\tilde{X}$ with
its dual space $\tilde{\tilde{X}}$ (Halmos, 1958). Thus, for every fixed $\phi$ in $\tilde{\tilde{X}}$ there is a unique $\hat{\phi}$ in $\tilde{X}$ such that


December 28, 1987 



George E. Backus 


26 


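for every $f$ in $\tilde{X}$

$$\phi(f) = \hat{\phi} \cdot f . \tag{5.6a}$$

In other words,

$$\tilde{\tilde{X}} = \tilde{X} . \tag{5.6b}$$

Together, (5.1) and (5.6) imply

$$X = \tilde{X} . \tag{5.6c}$$

The details of (5.6c) are as follows. For every fixed $x$ in $X$, there is a corresponding $\tilde{x} \in \tilde{X}$; and
vice versa. Setting $\phi = \tilde{\tilde{x}}$ in (5.6a), we find from $x$ a unique $\tilde{x}$ in $\tilde{X}$ such that for every $f$ in $\tilde{X}$,

$$\tilde{\tilde{x}}(f) = \tilde{x} \cdot f .$$

By (5.1a), this means that if $x$ is a fixed vector in $X$, there is a unique linear functional $\tilde{x}$ in $\tilde{X}$
such that for every $f$ in $\tilde{X}$

$$f(x) = f \cdot \tilde{x} . \tag{5.6d}$$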

Now we can define a quadratic form $Q_X$ on $X$. For any $x_1$ and $x_2$ in $X$ we require simply
that

$$Q_X(x_1, x_2) = \tilde{x}_1 \cdot \tilde{x}_2 . \tag{5.7a}$$

From (5.6d) it is easy to verify that $\tilde{x}$ depends linearly on $x$, so $Q_X(x_1, x_2)$ depends linearly on
each of $x_1$ and $x_2$ when the other is fixed. Also, from (5.5), $\tilde{x}_1 \cdot \tilde{x}_2 = \tilde{x}_2 \cdot \tilde{x}_1$, so
$Q_X(x_1, x_2) = Q_X(x_2, x_1)$. Finally, since $\tilde{Q}_X$ is positive definite, $Q_X(x, x) \ge 0$; and if $Q_X(x, x) = 0$
then $\tilde{x} = 0$. By (5.6c) this implies $x = 0$, so $Q_X$ is positive definite. From the positive definite
quadratic form $Q_X$ on $X$ we can define the obvious dot product,

$$x_1 \cdot x_2 = Q_X(x_1, x_2) , \tag{5.7b}$$

so

$$x_1 \cdot x_2 = \tilde{x}_1 \cdot \tilde{x}_2 . \tag{5.7c}$$

Tracing through the definitions, we see that (5.7c) implies that for any fixed $x_1$ and $x_2$ in $X$,

$$x_1 \cdot x_2 = \langle \tilde{x}_1(x)\, \tilde{x}_2(x) \rangle \tag{5.8a}$$

and hence, by (5.6c) and (5.7c),

$$x_1 \cdot x_2 = \langle (x_1 \cdot x)(x_2 \cdot x) \rangle . \tag{5.8b}$$

For the applications it will be useful to outline a less abstract approach to $Q_X$ than the
foregoing. We choose any fixed basis $b_1, \ldots, b_N$ for $X$. Then for each $x$ in $X$ there are unique real
numbers $\xi^1, \ldots, \xi^N$ such that

$$x = \xi^i b_i , \tag{5.9a}$$

and these numbers depend linearly on $x$. That is, there are $N$ linear functionals $b^i : X \to R$ such
that in (5.9a)

$$\xi^i = b^i(x) . \tag{5.9b}$$

Thus the $\xi^i$ are random variables, with expected values defined from $p_X$ as in (2.4): $\langle \xi^i \rangle = \langle b^i(x) \rangle$.
It is easy to verify that the vector $\langle x \rangle$ defined by

$$\langle x \rangle = \langle \xi^i \rangle b_i \tag{5.10}$$

is the same for all bases $b_1, \ldots, b_N$. Therefore we can shift the origin of $X$ to achieve (5.2b).

Next, we define the $N \times N$ matrix $V$ whose $ij$ entry is

$$V^{ij} = \langle \xi^i \xi^j \rangle . \tag{5.11a}$$

We claim that $V^{-1}$ exists, and we denote its $ij$ entry by $(V^{-1})_{ij}$ and also, when convenient, by
$V_{ij}$ (not $V_{ij}^{-1}$!). Thus

$$V_{ij} = (V^{-1})_{ij} . \tag{5.11b}$$

To prove that $V^{-1}$ exists, it suffices to prove that $V$ is positive definite, i.e., $a_i V^{ij} a_j > 0$ for any
real $N$-tuple $(a_1, \ldots, a_N) \ne (0, \ldots, 0)$. But for any real $a_1, \ldots, a_N$,

$$a_i V^{ij} a_j = \langle (a_i \xi^i)^2 \rangle .$$

Therefore $a_i V^{ij} a_j \ge 0$, and if $a_i V^{ij} a_j = 0$ then the probability is 1 that $x$ lies in the
$(N-1)$-dimensional subspace of $X$ given by

$$a_i \xi^i = a_i b^i(x) = 0 .$$

If this happens, we replace $X$ by that subspace.

Now for any vectors $x_1$ and $x_2$ in $X$, we are able to define a real number

$$Q_X(x_1, x_2) = b^i(x_1)\, V_{ij}\, b^j(x_2) . \tag{5.12}$$

It is an exercise in matrix algebra to verify that $Q_X(x_1, x_2)$ has the same value, whatever basis
$b_1, \ldots, b_N$ is used to calculate it. Obviously $Q_X(x_1, x_2)$ depends linearly on each of $x_1$ and $x_2$
when the other is fixed. Finally, since $V$ is symmetric and positive definite, so is $V^{-1}$. Hence
$Q_X(x_1, x_2) = Q_X(x_2, x_1)$, and if $Q_X(x, x) = 0$ then $b^i(x) = 0$ for all $i$, so, from (5.9), $x = 0$. Thus $Q_X$
is a positive-definite quadratic form constructed on $X$ from $p_X$, and independent of the basis used
in the construction. Now we define a dot product on $X$ by (5.7b). Proving (5.8b) from (5.5) is an
exercise in matrix algebra which we omit.
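The basis independence of (5.12) can be seen in a small numerical example. The sketch below works in two dimensions with an assumed covariance matrix $V$ and an assumed change of basis under which coordinates transform as $\xi' = A\xi$, so that the covariance becomes $A V A^T$. All numerical values and the helper names are illustrative; they are not taken from the text.

```python
def inverse_2x2(V):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = V
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def Q_X(xi1, xi2, V):
    """Q_X(x1, x2) = b^i(x1) V_ij b^j(x2) as in (5.12), with V_ij the entries of V^{-1}."""
    Vinv = inverse_2x2(V)
    return sum(xi1[i] * Vinv[i][j] * xi2[j] for i in range(2) for j in range(2))

def transform_coords(A, xi):
    """Coordinates in the new basis: xi' = A xi."""
    return [sum(A[i][j] * xi[j] for j in range(2)) for i in range(2)]

def transform_cov(A, V):
    """Covariance of the new coordinates: V' = A V A^T."""
    return [[sum(A[i][k] * V[k][l] * A[j][l] for k in range(2) for l in range(2))
             for j in range(2)] for i in range(2)]

V = [[4.0, 1.0], [1.0, 2.0]]      # assumed covariance <xi^i xi^j>
A = [[2.0, 1.0], [0.0, 1.0]]      # assumed invertible change of coordinates
x1, x2 = [1.0, 2.0], [3.0, -1.0]  # coordinates of two vectors in the old basis

q_old = Q_X(x1, x2, V)
q_new = Q_X(transform_coords(A, x1), transform_coords(A, x2), transform_cov(A, V))
print(q_old, q_new)
```

Both evaluations give the same number, reflecting the fact that the factors of $A$ from the coordinates cancel against those in the inverse covariance.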

Having obtained from $p_X$ a dot product on $X$ with the crucial property (5.8b), we can
choose an orthonormal basis for $X$, a basis $\hat{x}_1, \ldots, \hat{x}_N$ such that

$$\hat{x}_i \cdot \hat{x}_j = \delta_{ij} . \tag{5.13a}$$

For every vector $x$ in $X$, we can write

$$x = \sum_{i=1}^{N} x_i \hat{x}_i \tag{5.13b}$$

with

$$x_i = \hat{x}_i \cdot x . \tag{5.13c}$$

Equations (5.2) and (5.13c) imply

$$\langle x_i \rangle = 0 , \tag{5.14a}$$

while (5.8), (5.13c) and (5.13a) imply

$$\langle x_i x_j \rangle = \delta_{ij} . \tag{5.14b}$$

Therefore, there is a basis in $X$ which represents the vectors $x$ in $X$ in terms of coordinates
$x_1, \ldots, x_N$ which are uncorrelated random variables, each with zero mean and unit variance.

The dot product on $X$ makes $X$ a Euclidean space with a volume element $d^N x$, an
impossibility when $\dim X = \infty$ (Loewner, 1939). If $p_X$ has a density function $f$ with respect to this
volume element, i.e., if

$$dp_X(x) = d^N x\; f(x) , \tag{5.15a}$$

and if

$$f(x) = (2\pi)^{-N/2} \exp[-\|x\|^2 / 2] \tag{5.15b}$$

then $p_X$ is called a gaussian distribution. For a gaussian, if $i \ne j$ then $x_i$ and $x_j$ are not only
uncorrelated but independent, and not only have zero mean and unit variance but are identically
distributed.

The dot product (5.7b) is the natural one to use on $X$ when studying $p_X$. If $\hat{x}$ is any fixed
unit vector in $X$, (5.8b) shows that

$$\langle (\hat{x} \cdot x)^2 \rangle = 1 , \tag{5.16}$$

so (5.7b) measures each component of $x$ in units of the standard deviation of that component.
When $X$ is the data space $Y$, and $p_X$ is the probability distribution $p_R$ for the random error $\delta_R y$ in
the data vector $y$, then (5.16) says that (5.7b) measures each component $\hat{y} \cdot y$ of $y$ in units of its
error of measurement, the standard deviation of $\hat{y} \cdot \delta_R y$.

The foregoing calculations take a familiar form when $X$ is the data space $Y$, the data vectors
being $y = (y^1, \ldots, y^d)$. The dual space $\tilde{Y}$ consists of the $d \times 1$ column matrices $\tilde{y} = (\tilde{y}_1, \ldots, \tilde{y}_d)^T$,
where $T$ means matrix transpose. The value of $\tilde{y}$ at $y$ is

$$\tilde{y}(y) = y\, \tilde{y} , \tag{5.17}$$

matrix multiplication being intended. The natural basis for $Y$ is $b_1, \ldots, b_d$, where $b_i$ is the $1 \times d$
matrix whose $i$'th column is 1, the others being zero. The coordinate functional $b^i$ is $b_i^T$. Then

$$y = y^i b_i , \tag{5.18a}$$

so (5.2b) implies

$$\langle y^i \rangle = 0 , \tag{5.18b}$$

and (5.11a) gives

$$V^{ij} = \langle y^i y^j \rangle , \tag{5.18c}$$

while (5.12) is

$$y_1 \cdot y_2 = y_1^i\, V_{ij}\, y_2^j \tag{5.18d}$$

with $V_{ij}$ defined by (5.11b) as $(V^{-1})_{ij}$. In terms of the data $y^1, \ldots, y^d$, $p_R$ is gaussian if

$$dp_R(y) = (2\pi)^{-d/2} (\det V)^{-1/2} \exp[-\tfrac{1}{2}\, y^i V_{ij}\, y^j]\; dy^1 \cdots dy^d . \tag{5.19}$$

Now we return to (5.14) and the hardening of soft bounds. If $x_1, \ldots, x_N$ are not only
uncorrelated but independent, and have not only the same mean and variance but the same
one-dimensional probability distribution, with fourth moment $K+1$, then, as we have seen in section
3, the central limit theorem implies that with probability more than 0.997, $x$ satisfies

$$\left| N^{-1} \|x\|^2 - 1 \right| \le 3 (K/N)^{1/2} . \tag{5.20}$$

In (5.20) are two inequalities, or hard bounds, namely

$$N^{-1} \|x\|^2 \le 1 + 3 (K/N)^{1/2} \tag{5.21a}$$

$$N^{-1} \|x\|^2 \ge 1 - 3 (K/N)^{1/2} . \tag{5.21b}$$

If we use only (5.21a), we have discarded at least half the information contained in $p_X$. If we do
so, then for large $N$ we can neglect $(K/N)^{1/2}$ in (5.21a) and write (5.21a) as

$$N^{-1} \|x\|^2 \le 1 ; \tag{5.22}$$

but in the same approximation we can write (5.20) as

$$N^{-1} \|x\|^2 = 1 . \tag{5.23}$$

In fact, if we replace the soft bound $p_X$ by the hard bound (5.21a), we have discarded much
more than half the information in $p_X$, because for any $n$ such that $1 \ll n < N$, with probability
0.997 we will have

$$\left| n^{-1} \sum_{i=1}^{n} x_i^2 - 1 \right| < 3 (K/n)^{1/2} , \tag{5.24}$$

and these inequalities do not follow from (5.20). In short, the hardening of the soft bound $p_X$ to a
single quadratic inequality discards most of the hard information in $p_X$. Soft bounds contain
much more information than the corresponding hard quadratic bounds.

6. Conclusions

From the algebraic structure of the linear inverse problem, it is clear that without prior
information about the correct earth model $x_E$, the prediction vector $z$ is not usefully limited by the data
vector $y$ unless the prediction functionals are mere linear combinations of the data functionals.
This conclusion does not require a topology on the model space $X$, much less a norm or an inner
product. Therefore, when there are prediction functionals which are not linear combinations of
the data functionals, the linear inverse problem is insoluble without prior information about $x_E$.

The present paper compares two forms of prior information about $x_E$. One is a prior
personal probability distribution $p_X$ for $x_E$ in $X$, a "soft bound" on $x_E$. The other is a quadratic
inequality

$$Q_X(x_E, x_E) \le 1 , \tag{6.1}$$

a "hard bound." Energy constraints are examples of hard bounds.

We show that a hard bound can be "softened" to many different probability distributions $p_X$,
but all these $p_X$'s carry large amounts of new information about $x_E$ which is not present in (6.1).
For example, if $\dim X = \infty$ then $p_X$ assigns probability zero to the set of all earth models $x_E$ for
which $Q_X(x_E, x_E)$ is finite. When $\dim X$ is very large but finite, $p_X$ assigns very small probability
to the truth of (6.1), despite the fact that $p_X$ is supposed to represent a "fuzzy" version of (6.1).
In the inverse problem of downward continuation of the geomagnetic field $B$, softening the core
heat flow bound with a $p_X$ which treats appropriate multiples of the gauss coefficients as
independent, identically distributed random variables will lead an observer to convert his estimate of a
bound on the ohmic heat production rate in the core to a belief that this rate is infinite.

The same situation is encountered in reverse when we try to "harden" a probability
distribution $p_X$ to a quadratic inequality (6.1). Here $p_X$ generates a positive definite quadratic form $Q_X$
for which (6.1) is true with high probability. However, $p_X$ implies that many other quadratic
inequalities for $x_E$ are true with high probability, and all this information is lost when $p_X$ is replaced
by (6.1).

If the data vector $y$ is to be inverted by means of Bayesian inference or stochastic inversion,
the prior information about $x_E$ must be supplied in the form of a probability distribution $p_X$. If
there is objective evidence or a theoretical basis for $p_X$, or if $p_X$ is a hypothesis to be tested, then
all the prior information about $x_E$ carried in $p_X$ is legitimate, and an effective inversion will use
it. However, if $p_X$ is obtained by softening a hard quadratic bound (6.1), and $\dim X \gg 1$, then $p_X$
contains so much more information than (6.1) that stochastic and Bayesian inversions based on
$p_X$ would appear to be suspect. If the prior information is a hard quadratic bound (6.1), the
preferred technique for incorporating that information into a data inversion would appear to be hard
quadratic inversion (HQI), the multidimensional analogue of the method of confidence intervals
(Kendall & Stuart, 1979). HQI was explored briefly by Backus (1970a). Work in progress will
discuss further details, including resolution, incorporating systematic errors, and questions of
computational efficiency.